Let's explore the song names in the files that are in the Wikifonia dump from 2010.
The folder wikifonia20100503 comes from a dump of the wikifonia database found here:
First, let's list all those files:
In [1]:
import glob
In [4]:
fnames = glob.glob("../MusicXML_files/wikifonia20100503/*.xml")
fnames[:10]
Out[4]:
Now, for every file, let's read it and extract the title.
In [5]:
import xml.etree.cElementTree as ET
In [12]:
tree = ET.ElementTree(file=fnames[0])
In [16]:
root = tree.getroot()
root
Out[16]:
In [17]:
root.getchildren()
Out[17]:
In [21]:
root.find('identification/creator').text
Out[21]:
In [24]:
root.find('movement-title').text
Out[24]:
Let's write a function.
In [28]:
def title_composer(fname):
"Returns title and composer name from XML filename."
root = ET.ElementTree(file=fname).getroot()
return (root.find('identification/creator').text, root.find('movement-title').text)
In [29]:
title_composer(fnames[0])
Out[29]:
Let's now run a loop over all our data:
In [31]:
metadata = [title_composer(fname) for fname in fnames]
Finally, let's build a pandas dataframe using this data:
In [32]:
import pandas as pd
In [35]:
df = pd.DataFrame(data=metadata, index=fnames, columns=('composer', 'song_title'))
df.head(10)
Out[35]:
In [48]:
df.shape
Out[48]:
We can now easily filter some of the data. For instance all titles from the Rolling Stones.
In [46]:
df[df.composer.str.contains('rolling', case=False)]
Out[46]:
In [47]:
df[df.composer.str.contains('stone', case=False)]
Out[47]:
In [49]:
df[df.composer.str.contains('keith', case=False)]
Out[49]:
In [50]:
df[df.composer.str.contains('lennon', case=False)]
Out[50]:
In [ ]: